Dialect maps and dialect research; useful tools for automatic speech recognition?
Authors
Abstract
Traditional dialect maps are based on data from carefully selected informants, which usually results in clear-cut dialect borders (isoglosses), with a given dialect characteristic present on one side of the isogloss and absent on the other. We illustrate some of the problems and pitfalls of using dialect maps for ASR by comparing results from traditional dialect research with investigations of the Norwegian part of the European SpeechDat database, centred on the two main types of /r/ pronunciation. Our analysis shows that traditional dialect maps and surveys may be of limited use in ASR. The extent to which the Norwegian findings have parallels in other countries will depend on two main factors: dialect allegiance vs. a national standard pronunciation, and the extent to which the population is sedentary or mobile. Results from traditional dialect research may therefore be more useful in ASR of languages other than Norwegian.
Similar resources
Automatic Speech Recognition for Tunisian Dialect
Speech recognition for under-resourced languages has been an active field of research over the past decade. The Tunisian Arabic dialect has been chosen as a typical example of an under-resourced Arabic dialect. In this paper, we propose our first steps towards building an automatic speech recognition system for the Tunisian dialect. Several acoustic models have been trained using HMM-GMM and HMM-DNN ...
Automatic Dialect and Accent Recognition and its Application to Speech Recognition
Gaussian Mixture Selection and Data Selection for Unsupervised Spanish Dialect Classification
Automatic dialect classification has gained interest in the field of speech research because it is important for characterizing speaker traits and for estimating knowledge that could improve integrated speech technology (e.g., speech recognition, speaker recognition). This study addresses novel advances in unsupervised spontaneous Latin American Spanish dialect classification. The problem considers ...
Parallel Speech Corpora of Japanese Dialects
Clean speech data is necessary for spoken language processing; however, there is no public Japanese dialect corpus collected for speech processing. Parallel speech corpora of dialects are also important because real dialects affect each other; however, the existing data only includes noisy speech data of dialects and their translation into the common language. In this paper, we collected parallel spee...
Multi-Dialectical Languages Effect on Speech Recognition
Research has shown that automatic speech recognition (ASR) performance typically decreases when evaluated on a dialectal variation of the same language that was not used for training its models. Similarly, models simultaneously trained on a group of dialects tend to underperform when compared to dialect-specific models. When trying to decide which dialect-specific model (recognizer) to use to d...